FOREWORD


The paper is a progress report on efforts to place a particular retrieval system on a mathematical foundation. The work is not complete in that some serious mathematical problems are still outstanding. However, it was felt desirable to record the accomplishments for historical reasons and to invite critical reviews, in the hope of gaining insight that leads to improved mathematical models for retrieval systems in general.

The authors have made an effort to minimize the formal mathematical exposition in the interest of documentalists with little or no mathematical training.
ILLUSTRATIONS


Figure

1  Precision versus number of relevant documents in collection. Data are from the 100 questions based on source documents. Precision was estimated by p̂ = Σ x_i / Σ n_i. Dots represent values of p̂ calculated for each value of r from the corresponding subset of observations on (x, n). The curve represents the overall systems estimate for p calculated from all data observed in the test: p̂ = 85.7 percent.

2  Recall versus number of relevant documents in collection. Data are from the 100 questions based on source documents. The x's indicate recall ρ(r) estimated by the average recall ratio γ̄. The o's indicate recall ρ(r) estimated by ρ̂(r) = λ̂(r) p̂ / r, where p̂ is calculated from all test data; p̂ = 85.7 percent.

3  Confidence intervals about recall ρ(r). ρ(r) is estimated by ρ̂(r) = γ̄. Level of significance α = 2.5 percent.
ABSTRACT


The report suggests a method of constructing a mathematical model for the first test of the ABC Storage and Retrieval System and calculates 95-percent confidence intervals for relevance and recall values.

I. INTRODUCTION

This progress report on the evaluation of the first-generation ABC system is divided into five parts. In Part I, the subject is introduced, reference is made to reports previously published on the project, and the objectives are outlined. A few general remarks on the application of probabilistic models to the performance evaluation of retrieval tests are added.

In Part II, the test environment is described. Part III describes the statistical model on which the analysis of the test data is based. In Part IV, point estimates for the relevance and recall parameters of the model are developed. In Part V, we deal with the accuracy and reliability of these estimates and develop confidence intervals for the recall and relevance parameters of the ABC system.

In the appendices, various necessary formulas are derived.

A. Background

The ABC System (ref 1), developed by HDL* to effect efficient storage and retrieval of scientific and technical information, has been subjected to an extensive performance test. The first report in this series (ref 2) described the test program and setup as well as the methodology, and the second report (ref 3) gave a preliminary statistical evaluation of the test results. This third report of the series discusses a statistical model of the retrieval process, developed from a set of definitions and assumptions that were not refuted by experimental evidence. The model permits a rigorous analysis of the test data, including the approximation of 95-percent confidence intervals for the relevance and recall performance of the system.


*The Army Research Office, Scientific and Technical Information Division, Washington, D.C., supports the development of the ABC System.
B. Why a Statistical, Probabilistic Model?


There is a growing literature* on the use of mathematical models for the description and analysis of information storage and retrieval systems. So far, the majority of the models suggested have been deterministic, in that they allow one to predict the outcome of a retrieval experiment with certainty. These models make use of the fact that certain aspects of information systems** *** can be "mapped" into, or represented by, various abstract mathematical structures based on Boolean algebras, topology, lattice theory, set theory, and related descriptions.

Probabilistic models, on the other hand, have been used to study phenomena in information systems that obey statistical or probabilistic rather than deterministic laws. Among such phenomena, one might mention the number of monthly accessions of a collection or the number of users in a given period of time. Obviously, the outcome of any particular retrieval run observed in our test (say, four documents retrieved, three of which are relevant) also obeys probabilistic laws, in the sense that it could not have been predicted with certainty.

One of the major purposes of this report is to study these probabilistic laws and to specify them. The process of specification, or choice of a particular statistical model, requires acceptance of a few simplifying assumptions leading to the specific probabilistic equations of the model. Once the model equations have been formulated, estimators for the performance measures relevance and recall can be derived and, subsequently, the accuracy of these estimators can be determined in terms of confidence intervals.

II. TEST ENVIRONMENT

The test collection consisted of approximately 3650 documents (journal articles and reports) on solid-state circuits, devices, and applications. These covered the 1959-1964 period. The subject area was sufficiently small to permit comprehensive coverage of the open literature.


*An annotated bibliography is given at the end of the report.

**In particular, systems using computers for storage and/or retrieval.

***For definition, see Parts II and III.
The main objective was to test the effectiveness of the ABC dictionary (a KWIC-type index of concise, informative, and indicative document descriptions in natural English). Two dictionaries (one long, one short) were used in the test. For the printout of the long version, 250 terms were excluded from the permutation; in the short version, this list of terms was extended to include numerous noninformative terms, e.g., improvement and development.

A total of 136 questions were used for the test. These consisted of 100 questions based on source documents and 36 questions formulated by scientists with a broad overall knowledge of the contents of the collection.

Each retrieval run consisted of a single search performed by a test operator in one of the ABC dictionaries and resulted in a list of document descriptions. After the search was completed, the documents corresponding to the selected descriptions were analyzed by independent umpires for relevance to the question at hand, and the proportion of relevant documents could then be determined.

On the average, each document description was accessible (in other words, was permuted in the KWIC arrangement) under about four to five index terms.

The average number of relevant documents per question was determined to be 8.20 for the 36 general questions and 8.60 for the 100 source-document questions. (The procedure for assessing relevance is described in reference 3, page 6.)

Detailed information regarding test design and procedure is given in references 2 and 3.

III. DESCRIPTION OF THE STATISTICAL MODEL

A. Notation and Basic Definitions

The following letter symbols are used to denote the basic parameters and variables of the test:

N = number of documents available for retrieval in answer to a question

r = number of documents in the collection relevant to a given question (or a given set of questions, if explicitly stated)

x = number of documents retrieved in a retrieval run (by a retrieval operator in response to a given question) that are relevant to the question

y = number of documents retrieved in a retrieval run that are not relevant to the question at hand

n = x + y = number of documents retrieved in a retrieval run. Each retrieval run (or inquiry) consists of n trials; i.e., each document retrieved in answer to a question is a trial.

x/n = relevance ratio for a retrieval run

x/r = recall ratio for a retrieval run

For the analysis of the test results, N and r are known parameters, and x, y, and n are random variables observed in the test.

B. Basic Assumptions and Model Equations

Probability Laws for x and y

For a retrieval run yielding n = x + y retrieved documents, it was assumed that x and y are independent and Poisson distributed; therefore,

$$ g(x, \mu) = \frac{e^{-\mu}\,\mu^{x}}{x!} \qquad (1) $$

$$ g(y, \nu) = \frac{e^{-\nu}\,\nu^{y}}{y!} \qquad (2) $$

where g(x, μ) denotes the probability mass function (pmf) of x, and g(y, ν) the pmf of y. For a given x, g(x, μ) is the probability of retrieving exactly x relevant documents, and μ is the expected value, or population mean, of x; likewise, ν is the expected value of y. A necessary condition for the independence of x and y is* that the retrieval of a relevant document at any particular trial is independent of the results at preceding trials. This condition was provided for to some extent by the test design, inasmuch as relevance was judged by independent umpires after the retrieval run was completed.

The assumption that x and y are Poisson distributed was made because the Poisson distribution is often applied to the analysis of rare events involving chance processes, which the retrieval process is considered to be: very few documents are retrieved from a relatively large collection, and whether a trial is relevant or not is a matter of chance. A more detailed rationale is presented in Appendix A.

*It can be shown that x and y are independent if and only if n is Poisson distributed.

χ² tests* were performed on numerous samples to check whether the retrieval data fitted a Poisson distribution, and no sample tested gave evidence to refute the assumption.
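The report does not describe how the χ² tests were set up. As a rough illustration only, the sketch below assumes the usual Pearson test for a Poisson fit: the Poisson mean is estimated by the sample mean (costing one degree of freedom), and cells are pooled from the upper tail until each expected frequency is at least 5. The function names, the pooling rule, and the sample counts are assumptions, not the procedure or data of the test.

```python
import math
from collections import Counter


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of exactly k events for a Poisson law with the given mean."""
    return math.exp(-mean) * mean**k / math.factorial(k)


def poisson_chi_square(counts: list[int], min_expected: float = 5.0) -> tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for a Poisson fit.

    `counts` holds one observed value (e.g. x, y, or n) per retrieval run.
    """
    size = len(counts)
    mean = sum(counts) / size          # Poisson mean estimated from the sample
    freq = Counter(counts)
    k_max = max(counts)

    # Cells 0, 1, ..., k_max - 1, plus a tail cell "k_max or more".
    observed = [freq[k] for k in range(k_max)] + [freq[k_max]]
    probs = [poisson_pmf(k, mean) for k in range(k_max)]
    probs.append(1.0 - sum(probs))     # tail probability P(X >= k_max)
    expected = [size * p for p in probs]

    # Pool cells, starting from the upper tail, until each expected count is large enough.
    cells: list[tuple[float, float]] = []
    obs_acc = exp_acc = 0.0
    for o, e in zip(reversed(observed), reversed(expected)):
        obs_acc += o
        exp_acc += e
        if exp_acc >= min_expected:
            cells.append((obs_acc, exp_acc))
            obs_acc = exp_acc = 0.0
    if exp_acc > 0 and cells:          # leftover low cells join their neighbor
        o, e = cells[-1]
        cells[-1] = (o + obs_acc, e + exp_acc)

    statistic = sum((o - e) ** 2 / e for o, e in cells)
    dof = len(cells) - 1 - 1           # cells - 1 - one estimated parameter
    return statistic, dof


# Hypothetical counts that roughly follow a Poisson law with mean 2.
sample = [0] * 27 + [1] * 54 + [2] * 54 + [3] * 36 + [4] * 18 + [5] * 7 + [6] * 3 + [7] * 1
stat, dof = poisson_chi_square(sample)
print(stat, dof)
```

For this hypothetical sample, the statistic comes out far below the tabulated 5-percent critical value of χ² with 4 degrees of freedom (about 9.49). A sample like this would therefore not refute the Poisson assumption.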

Probability Law of x given n

Given that x and y are independently distributed according to equations (1) and (2), the pmf of x, given that n documents have been retrieved, can be derived (Appendix B) and is found to be binomial:

$$ f(x \mid n) = \binom{n}{x} p^{x} (1-p)^{n-x} \qquad (3) $$

The expected value** of x given n, denoted E[x | n], is

$$ E[x \mid n] = np $$

For any fixed value of x, say x = k, f(k | n) is the probability of finding exactly k relevant documents among the n documents retrieved. As a first approximation, we can assume n ≤ r for our tests***.

The parameter p can be interpreted as the probability that a document is relevant given that it is retrieved; therefore, p represents the precision (relevance) of the ABC system.
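The binomial law (3) can be illustrated numerically by conditioning two independent Poisson variables on their sum. The sketch relies on the standard result that, for independent Poisson x and y with means μ and ν, x given n = x + y is binomial with p = μ/(μ + ν). The values of μ, ν, and n below are hypothetical and chosen only for the check.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    return math.exp(-mean) * mean**k / math.factorial(k)


def binomial_pmf(k: int, n: int, p: float) -> float:
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def x_given_n_from_poisson(x: int, n: int, mu: float, nu: float) -> float:
    """P(x | n) computed directly from the independent Poisson laws (1) and (2)."""
    joint = poisson_pmf(x, mu) * poisson_pmf(n - x, nu)   # x relevant, n - x not relevant
    total = sum(poisson_pmf(j, mu) * poisson_pmf(n - j, nu) for j in range(n + 1))  # P(n)
    return joint / total


mu, nu, n = 3.0, 0.5, 4          # hypothetical means of x and y, and an observed n
p = mu / (mu + nu)               # precision implied by the two Poisson means
for x in range(n + 1):
    print(x, x_given_n_from_poisson(x, n, mu, nu), binomial_pmf(x, n, p))
```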

Probability Law for n

In a similar fashion, the pmf of n, say g(n), can be derived from (1) and (2), assuming the independence of x and y; n is also found to follow a Poisson distribution:

$$ g(n) = \frac{e^{-\lambda}\,\lambda^{n}}{n!} \qquad (4) $$

with mean E[n] = λ, where λ = μ + ν.


*A common statistical method to test the goodness of fit of a probability distribution.

**The expected value, or "population mean," is the mean of the phenomenon being observed, weighted by its probability distribution.

***For the data analyzed, n ≤ r was found to hold for 98 percent of all runs.

See Appendix B for the derivation of g(n).


λ, the expected number of documents retrieved, can be regarded as a measure of the retrieval effort. Test results show λ to be dependent on r; therefore, we will always write λ = λ(r).

For any fixed value of n, say n = m, g(m) specifies the probability that exactly m documents will be retrieved.
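The Poisson law (4) for n can be illustrated by convolving (1) and (2): the probability that x + y = m is the sum over all ways of splitting m into relevant and nonrelevant documents. In the sketch below, μ and ν are hypothetical values. The result should match a Poisson pmf with λ = μ + ν.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    return math.exp(-mean) * mean**k / math.factorial(k)


def pmf_of_n(m: int, mu: float, nu: float) -> float:
    """P(n = m) for n = x + y, with x and y independent Poisson (means mu, nu)."""
    return sum(poisson_pmf(x, mu) * poisson_pmf(m - x, nu) for x in range(m + 1))


mu, nu = 2.2, 0.4                 # hypothetical means of x and y
lam = mu + nu                     # retrieval effort lambda
for m in range(8):
    print(m, pmf_of_n(m, mu, nu), poisson_pmf(m, lam))
```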

The validity of (4) was checked for various representative samples of the retrieval data, again using χ² methods to test the goodness of fit. The results show (4) to be a good approximation and also confirm λ to be dependent on r.

So far, we have specified three basic probability laws, (1), (2), and (4), that allow us to predict, in terms of probabilities, the outcome of observations of our test variables x, y, and n taken individually. Furthermore, (3) gives the probability of each value of x, given that n has already been observed.

Now we would also like to find the probability of observing a fixed value of x and a fixed value of n jointly.

According to statistical theory, the probability law for the joint distribution of x and n is obtained as the product of the pmf of n and the pmf of x given n:

$$ h(x, n, p) = f(x, p \mid n) \cdot g(n) \qquad (5) $$

Using (3) and (4), we have

$$ h(x, n, p) = \binom{n}{x} p^{x} (1-p)^{n-x} \, \frac{e^{-\lambda}\,\lambda^{n}}{n!} \qquad (6) $$

The derivation of (6) completes our set of probability laws, which constitute the statistical model. The reader should recognize that the preceding section and the appendixes are of crucial importance in establishing a correspondence between the physical world, i.e., the test data, and the mathematical model for these data. From here on, most results will be obtained by analytical derivation, with little or no additional information about the physical world needed. Therefore, the usefulness of the results derived will depend almost exclusively on the soundness of the human judgment that led to the model assumptions and, consequently, to the formulation of the probabilistic laws. In the next part, we will be concerned with deriving formulas that can be used to estimate the parameters of the test, in particular precision and recall, from observed sample values.
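The joint law (6) can be sanity-checked numerically. Summed over all pairs (x, n), it should give 1, and the mean of x under it should equal pλ. The sketch truncates the sum over n at a point where the Poisson tail is negligible. The values of p and λ are hypothetical.

```python
import math


def h(x: int, n: int, p: float, lam: float) -> float:
    """Joint pmf (6): binomial law of x given n, times the Poisson law of n."""
    binom = math.comb(n, x) * p**x * (1 - p) ** (n - x)
    poisson = math.exp(-lam) * lam**n / math.factorial(n)
    return binom * poisson


p, lam = 0.857, 2.5          # hypothetical precision and retrieval effort
N_MAX = 60                   # truncation of the sum over n; P(n >= 60) is negligible here

pairs = [(x, n) for n in range(N_MAX) for x in range(n + 1)]
total = sum(h(x, n, p, lam) for x, n in pairs)
mean_x = sum(x * h(x, n, p, lam) for x, n in pairs)
print(total, mean_x, p * lam)
```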
IV. ESTIMATION OF PRECISION AND RECALL

A. Estimation of Precision p and λ(r), the Expected Number of Documents Retrieved

The method of maximum likelihood* was used to estimate the precision parameter p and the retrieval effort λ(r) based on a series of k retrieval runs**.

Using the hat symbol (^) to denote estimators, we find

$$ \hat{p} = \frac{\sum_{i=1}^{k} x_i}{\sum_{i=1}^{k} n_i} \qquad (7) $$

Precision is thus estimated from a series of k observations not by averaging the relevance ratios x_i/n_i, but by averaging x_i and n_i separately.
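To make the distinction concrete, the sketch below contrasts estimator (7) with the plain average of relevance ratios for a handful of hypothetical (x_i, n_i) pairs. The numbers are illustrative, not test data.

```python
def precision_estimate(runs: list[tuple[int, int]]) -> float:
    """Estimator (7): total relevant retrieved divided by total retrieved."""
    total_x = sum(x for x, _ in runs)
    total_n = sum(n for _, n in runs)
    return total_x / total_n


def mean_relevance_ratio(runs: list[tuple[int, int]]) -> float:
    """The alternative that (7) does not use: the average of x_i / n_i.

    Runs with n_i = 0 have no defined ratio and are skipped here.
    """
    ratios = [x / n for x, n in runs if n > 0]
    return sum(ratios) / len(ratios)


runs = [(1, 1), (2, 3), (4, 4), (0, 1)]   # hypothetical (x_i, n_i) for k = 4 runs
print(precision_estimate(runs))    # 7 / 9
print(mean_relevance_ratio(runs))  # (1 + 2/3 + 1 + 0) / 4
```

Pooling the sums weights each run by its n_i, so runs that retrieve many documents count for more. A run that retrieves nothing simply adds zero to both sums in (7).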

In similar fashion, an estimator for λ(r) is derived:

$$ \hat{\lambda}(r) = \frac{1}{k} \sum_{i=1}^{k} n_i \qquad (8) $$

The expected value of n for a single run is thus estimated from k observations of n as their arithmetic mean. Since experimental evidence showed λ to be dependent on r, the estimation of this parameter will be based on samples of n with constant r. In other words, the model appears to be applicable only to subsets of data. The same is true of the recall ratio, which is discussed later.

Precision p was introduced in (3) as a basic parameter of the model. Therefore, p should be independent of r, and (7) can be used to check the validity of this statement. For this purpose, we ranked the 100 questions based on source documents into 24 groups corresponding to the 24 values of r assessed for them. Then, for each value of r, p̂ was calculated using (7). In addition, a value for p̂ was obtained by extending the summations in (7) over all data observed in the test.*** The results are shown in figure 1.


*Described in many textbooks on statistics.

**See Appendix C for the complete derivation.

***Including more than 1000 retrieval runs.
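Figure 1. Precision versus number of relevant documents in collection. Dots represent values of p̂ calculated for each value of r from the corresponding subset of observations on (x, n). The solid line represents the overall systems estimate for p calculated from all data observed in the test: p̂ = 85.7 percent.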


Evidently, p̂ is fairly independent of r to a first approximation, although a slight tendency for p̂ to increase with r should not be overlooked.
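The grouping procedure used for figure 1 can be sketched as follows. The sketch assumes each run is recorded as a hypothetical (r, x, n) triple. For every group with the same r, it applies (7) and (8), and it also pools all runs to obtain the overall estimate of p. The function name and data are illustrative only.

```python
from collections import defaultdict


def estimates_by_r(runs):
    """Apply (7) and (8) within each group of runs sharing the same r.

    `runs` is a list of (r, x, n) triples, one per retrieval run.
    Returns ({r: (p_hat, lambda_hat, k)}, pooled p_hat over all runs).
    """
    groups = defaultdict(list)
    for r, x, n in runs:
        groups[r].append((x, n))

    table = {}
    for r in sorted(groups):
        pairs = groups[r]
        k = len(pairs)
        sum_x = sum(x for x, _ in pairs)
        sum_n = sum(n for _, n in pairs)
        p_hat = sum_x / sum_n if sum_n else float("nan")   # (7); undefined if nothing retrieved
        lambda_hat = sum_n / k                              # (8)
        table[r] = (p_hat, lambda_hat, k)

    pooled_p = sum(x for _, x, _ in runs) / sum(n for _, _, n in runs)
    return table, pooled_p


# Hypothetical runs: (r, x, n)
runs = [(1, 1, 1), (1, 0, 1), (1, 1, 2), (2, 2, 2), (2, 1, 2), (2, 1, 1)]
table, pooled_p = estimates_by_r(runs)
print(table)
print(pooled_p)
```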

B. Estimation of Recall

The precision parameter p appeared in our model as the parameter of the binomial probability law that governs the frequencies of observing any particular number of relevant documents, given that n documents have been retrieved. With recall, the situation is different. Since our basic model equations (1) and (2) do not contain r explicitly, the model cannot contain parameters that could be used to represent recall.* Therefore, we will introduce a recall parameter ρ(r), defined as the expected value of the average recall ratio observed in the test.

Assume we observe the recall ratio x/r for a series of k retrieval runs with constant r:

x_1/r, x_2/r, ..., x_k/r

Now, let

$$ \gamma_i = \frac{x_i}{r}, \qquad \bar{\gamma} = \frac{1}{k} \sum_{i=1}^{k} \gamma_i = \frac{\sum_{i=1}^{k} x_i}{rk} $$

Then we define a recall parameter ρ(r) as the expected value of the average recall ratio γ̄:

$$ \rho(r) = E[\bar{\gamma}] \qquad (9) $$


*The model discussed in Appendix E of this report uses a different basic approach and contains a recall parameter, a, but no precision parameter.
Using the method of the moment generating function, E[γ̄] can be derived (Appendix D, Section 2) and is found to be

$$ E[\bar{\gamma}] = \frac{\lambda(r)\,p}{r} \qquad (10) $$

ρ(r) can therefore be estimated using our previously derived estimates λ̂(r) and p̂:

$$ \hat{\rho}(r) = \frac{\hat{\lambda}(r)\,\hat{p}}{r} \qquad (11) $$


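Using the summations in (7) and (8), we obtain

$$ \hat{\rho}(r) = \frac{1}{r} \cdot \frac{\sum_{i=1}^{k} n_i}{k} \cdot \frac{\sum_{i=1}^{k} x_i}{\sum_{i=1}^{k} n_i} $$

Since the summations of n_i are over the same sets of data (r = constant), they cancel each other, and we obtain

$$ \hat{\rho}(r) = \frac{\sum_{i=1}^{k} x_i}{rk} = \bar{\gamma} \qquad (12) $$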


For a series of k retrieval runs with constant r, the recall parameter ρ(r) can be estimated to a first approximation by the average recall ratio γ̄ observed for the k runs. In figure 2, values of ρ̂(r) obtained from (12) have been plotted versus r. In the same diagram, a second set of values for ρ̂(r) is plotted; these were obtained from the general formula (11), using the overall systems estimate p̂ calculated from all test data. From the agreement between the two sets of values, it is evident that the choice of p̂, whether from subsets of data with constant r or from all data, has little influence on the estimation of recall. The dependence of recall on r in figure 2 probably stems from the relatively low retrieval effort λ̂(r), which varied between 1 and 4 in the test. The second report of this series (ref 3) shows that, in the majority of cases, the observed average recall ratios were less than 10 percent below the optimum obtainable for a given n and r.
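The cancellation leading to (12) is easy to confirm numerically. Using hypothetical runs with a common r, the sketch computes ρ̂(r) in two ways: once through (11) with the group estimates from (7) and (8), and once as the average recall ratio γ̄. The two agree up to rounding. If the pooled p̂ from all test data is used in (11) instead, as for the second set of points in figure 2, the identity no longer holds exactly.

```python
def recall_two_ways(xs: list[int], ns: list[int], r: int) -> tuple[float, float]:
    """Return (rho_hat from (11), gamma_bar from (12)) for k runs sharing the same r."""
    k = len(xs)
    lambda_hat = sum(ns) / k                  # (8)
    p_hat = sum(xs) / sum(ns)                 # (7), from this group only
    rho_hat = lambda_hat * p_hat / r          # (11)
    gamma_bar = sum(x / r for x in xs) / k    # (12): average recall ratio
    return rho_hat, gamma_bar


xs = [1, 2, 0, 3]   # hypothetical relevant documents retrieved per run
ns = [1, 3, 1, 3]   # hypothetical documents retrieved per run
print(recall_two_ways(xs, ns, r=5))   # both values equal 0.3 up to rounding
```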

We have now derived estimates for precision and recall based on observations from the test. An estimate for the retrieval effort λ(r) was also obtained. We found evidence for our assumption that the precision p is a true systems parameter of relevance; however, recall can be estimated only for subsets of retrieval data with constant r. In Part V, confidence intervals will be established for both relevance and recall. The purpose here is to determine the accuracy or reliability of the estimates derived so far.

V. DETERMINATION OF CONFIDENCE INTERVALS

Although a detailed discussion of the significance of confidence intervals is beyond the scope of this report, a few explanatory remarks seem appropriate for the nonstatistically minded reader. Assume a statistical parameter p is to be estimated by calculating p̂ for each of a large number of samples. Then, for each sample, a confidence interval is determined so that p is contained within the intervals for a high proportion of samples, while it may fall outside the interval in a few cases. The small probability that p will fall outside the upper or lower limit of the intervals must be preselected before the limits are calculated, and is called the level of significance. For this report, the level of significance, often denoted α, is chosen to be 0.025. The value of 2α can be thought of as the probability of making an error in claiming that the calculated limits include p.

In symbolic notation, using p̲ to denote the lower limit, p̄ to denote the upper limit, and P to denote the probability that the statement in brackets is true,

$$ P[\underline{p} < p < \overline{p}] = 1 - 2\alpha = 95 \text{ percent} \qquad (13) $$

Equation (13) is equivalent to the following proposition: the true value of the parameter p will be contained in the limits determined from samples in 95 percent of all cases or, for a limited number of samples analyzed, in roughly 95 percent of them. In a very loose sense, confidence intervals may be likened to error bounds about the true value of the parameter.

If the preselected error probability 2α is small, the investigator may have a high degree of confidence in the assumption that the true value of the parameter being estimated falls within the limits of the error bands.

After these general remarks, we will give confidence bands about ρ(r) and p.

A. Confidence Intervals about Recall ρ(r)

In Appendix D, it is shown that Σ x_i, summed over the k runs of a sample with constant r, has a Poisson distribution with mean kλ(r)p.

If we define

$$ \lambda'(r) = k\,\lambda(r)\,p, \qquad (14) $$


then we can use statistical tables (ref 7) for confidence intervals about the mean of a Poisson distribution to find upper and lower limits for λ'(r). The corresponding intervals about ρ(r) = λ(r)p/r may then be obtained using the identity

$$ \rho(r) = \frac{\lambda(r)\,p}{r} = \frac{\lambda'(r)}{kr} \qquad (15) $$


and we can write

$$ P\left[\frac{\underline{\lambda}'(r)}{kr} < \rho(r) < \frac{\overline{\lambda}'(r)}{kr}\right] = 95 \text{ percent} \qquad (16) $$

for a level of significance α = 0.025.
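In place of printed tables such as ref 7, the limits can be computed directly. The sketch assumes exact (Garwood-type) Poisson limits, with each tail carrying probability α, found by bisection on the Poisson distribution function. Whether ref 7 tabulates precisely these limits is an assumption. The observed count is Σ x_i over the k runs with a given r, and the limits are divided by kr as in (16). The helper names and data are hypothetical.

```python
import math


def poisson_cdf(s: int, mean: float) -> float:
    """P(X <= s) for X ~ Poisson(mean), summed term by term.

    Adequate for moderate counts (a few hundred at most); exp(-mean) underflows beyond that.
    """
    term = math.exp(-mean)
    total = term
    for j in range(1, s + 1):
        term *= mean / j
        total += term
    return total


def root_of_decreasing(f, lo: float, hi: float, tol: float = 1e-10) -> float:
    """Bisection for a decreasing f with f(lo) > 0 > f(hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def poisson_mean_limits(s: int, alpha: float = 0.025) -> tuple[float, float]:
    """Exact lower and upper limits for a Poisson mean after observing the count s.

    Each tail has probability alpha, so the two-sided level is 1 - 2*alpha.
    """
    top = s + 20 + 10 * math.sqrt(s + 1)   # safely above the upper limit
    # Lower limit L: P(X >= s; L) = alpha, i.e. P(X <= s - 1; L) = 1 - alpha.
    lower = 0.0 if s == 0 else root_of_decreasing(
        lambda m: poisson_cdf(s - 1, m) - (1 - alpha), 0.0, top)
    # Upper limit U: P(X <= s; U) = alpha.
    upper = root_of_decreasing(lambda m: poisson_cdf(s, m) - alpha, 0.0, top)
    return lower, upper


def recall_limits(xs: list[int], r: int, alpha: float = 0.025) -> tuple[float, float]:
    """Limits (16) on rho(r): Poisson limits on lambda'(r) divided by k*r."""
    k = len(xs)
    lower, upper = poisson_mean_limits(sum(xs), alpha)
    return lower / (k * r), upper / (k * r)


xs = [1, 0, 2, 1]   # hypothetical x_i for k = 4 runs with r = 2
print(recall_limits(xs, r=2))
```

Nothing in this construction keeps the upper limit below 1 (100 percent), which is consistent with the behavior the text reports for the sample with r = 1.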

In figure 3, confidence intervals about ρ(r), estimated by ρ̂(r), are presented as vertical lines for each sample of data with fixed r. The number of retrieval runs is shown under each value of r, for r = 1 to 10. The sample with r = 1 requires some comment: here, the upper limit on ρ(r) was found to be larger than 100 percent. The reason is that in 8 of the 59 valid retrieval runs involved, n exceeded r, which contradicts the assumption n ≤ r made for the model.

B. Confidence Intervals about Precision p

Since the distribution of the average relevance ratio for k runs is not known, intervals about p have been determined using the tabulated upper and lower limits for λ'(r) (ref 7). By definition (14), we have

$$ p = \frac{\lambda'(r)}{k\,\lambda(r)} \qquad (17) $$


and, using λ̂(r) to estimate the unknown parameter λ(r) in (17), we get an estimator for p:

$$ \hat{p} = \frac{\lambda'(r)}{k\,\hat{\lambda}(r)} \qquad (18) $$


From (18), lower and upper limits on p can be obtained as

$$ \underline{p} = \frac{\underline{\lambda}'(r)}{k\,\hat{\lambda}(r)} \qquad (19) $$

and

$$ \overline{p} = \frac{\overline{\lambda}'(r)}{k\,\hat{\lambda}(r)} \qquad (20) $$


For several samples, the upper limits on p determined from (20) exceeded 100 percent. This seeming inconsistency is, however, easily explained:

(1) the relevance as estimated from these samples was 90 percent;

(2) in (20), the unknown λ(r) had to be estimated by λ̂(r).

From (1) and (2), it follows that if λ̂(r) underestimates λ(r) by more than 10 percent, p̄ will exceed 100 percent.

There is, however, a way of avoiding this problem. In practice we are interested here only in a lower limit, which will enable us to say that, based on a given sample of data and a level of significance α, the true systems relevance p is better than some lower limit p̲. We can therefore use a different method and determine a one-sided confidence interval about p. This would correspond to a probability statement


$$ P\{p > \underline{p}\} = 1 - 2\alpha = 95 \text{ percent} \qquad (21) $$

which, for α = 0.025, is equivalent to saying that the probability that the precision p is greater than the lower limit p̲ is 95 percent.
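A one-sided lower limit of the form (19), read in the sense of (21), can be sketched in the same way. The sketch assumes the exact one-sided Poisson lower limit for λ'(r), with error probability 2α. It uses the fact that kλ̂(r) equals Σ n_i by (8). Because λ̂(r) is itself an estimate, the 95-percent figure is only approximate. The inputs are hypothetical.

```python
import math


def poisson_lower_limit(s: int, tail: float, tol: float = 1e-10) -> float:
    """One-sided lower limit L for a Poisson mean: P(X >= s; L) = tail."""
    if s == 0:
        return 0.0

    def excess(m: float) -> float:
        # P(X <= s - 1; m) - (1 - tail); decreasing in m
        term = math.exp(-m)
        cdf = term
        for j in range(1, s):
            term *= m / j
            cdf += term
        return cdf - (1 - tail)

    lo, hi = 0.0, float(s)   # the lower limit is always below the observed count
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def precision_lower_limit(xs: list[int], ns: list[int], alpha: float = 0.025) -> float:
    """Lower limit (19) on p with P{p > limit} = 1 - 2*alpha, as in (21).

    k * lambda_hat(r) from (8) is just the total number retrieved, sum(ns).
    """
    lam_prime_lower = poisson_lower_limit(sum(xs), tail=2 * alpha)
    return lam_prime_lower / sum(ns)


xs = [3, 4, 2]   # hypothetical relevant documents retrieved, k = 3 runs with one r
ns = [3, 5, 2]   # hypothetical documents retrieved
print(sum(xs) / sum(ns), precision_lower_limit(xs, ns))
```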

Lower limits for p determined in this fashion, and the corresponding values of p̂ = Σ x_i / Σ n_i, are given in table I for each subset of retrieval runs with a given value of r, for r = 1, 2, ..., 18. For r > 18, the samples were too small to allow the determination of meaningful limits.

Table I. One-Sided Confidence Intervals on Precision p, Estimated from Samples of Data with Constant r

   r    p̂ (percent)    lower limit p̲ (percent)
   1        77               59
   2        63               52
   3       100               79
   4        78               61
   5        71               54
   6        87               73
   7         ?                ?
   8        90               78
   9        92               74
  10        68               63
  11        83               55
  12        75               58
  13        92               71
  14        89               76
  15       100               75
  16        90               69
  17       100               77
  18       100               73
An example may serve to explain how the data from table I can be used to interpret our test results. From the group of retrieval runs performed for all document-based questions with r = 8, we estimate the precision to be 90 percent. Based on this sample of data, we can also say with confidence* that the true precision p is better than 78 percent. Going to the next sample, with r = 9, we would estimate p to be 92 percent, with a lower limit of 74 percent. The smaller sample size for r = 9 causes a decrease in the lower limit for p, since k, the number of retrieval runs, appears in the denominator of formula (19) for p̲. Results for the other samples should be interpreted in the same way. For the majority of samples, the estimate for p is better than 80 percent, and the lower limit is better than 70 percent. This result should allow an evaluator of ABC system performance to be confident that the precision of the system under test conditions is better than 70 percent. Unfortunately, the lower limits on p had to be determined from small samples drawn from subsets of data with constant r, since the quantity λ̂(r), which depends on r, was involved in the calculation. If the precision estimated from all test data (p̂ = 85.7 percent) had served as the basis for determining p̲, the lower limit would have been higher, perhaps close to 80 percent.

*The degree of this subjective confidence can be specified by relating it to the numerical value of the error probability 2α = 5 percent in (21).

C. SUMMARY

We can summarize the results of our discussion of the errors involved in estimating relevance and recall from our test data in the following statements:

(1) Recall, defined as the expected value of the average recall ratio, is high when estimated from samples of data with a low value of r. For example, retrieval runs (about 80 runs in the test) for which there was only one relevant document in the collection (r = 1) yielded an estimated recall of more than 80 percent. When estimated from samples with somewhat larger values of r, say r = 2 to 5, recall decreases to 33 percent, and the corresponding lower limits decrease from about 80 to 20 percent. At still higher values of r, up to r = 10, recall shows a further decrease, though at a much slower rate, to about 17 percent, with a lower limit of 15 percent. The reader should remember that the model estimates recall simply as the average recall ratio observed; therefore, the estimates given here represent data observed in the test. It should also be mentioned at this point (see reference 3 for a detailed discussion) that the actual average recall ratios observed in the test were only a few percent short of the optimal recall ratios obtainable for the observed number of documents retrieved.*

(2) The model allows the systems relevance p to be estimated effectively both from all data and from subsets with constant r. To a first approximation, the estimates of p obtained from subsets of data with constant r appear to be independent of r. For a level of significance of α = 2.5 percent, the lower limits on p, as determined from a one-sided interval, are greater than 50 percent for all samples with constant r, r = 1 to 18, the majority being greater than 70 percent. On the basis of all test data (fig. 1), we estimate the ABC system relevance as p̂ = 85.7 percent. Based on subsets of test data with constant r (r = 1, 2, ..., 18), we find 63 ≤ p̂ ≤ 100 percent, and for the majority of samples, p̂ > 80 percent.

Evidently, the model is successful inasmuch as a basic assumption, the existence of a constant precision parameter p, is confirmed by the test data.

The soundness of the model is further corroborated by the fact that the second assumption, the equivalence of the retrieval process under test conditions with a Poisson process, leads to probability laws for the variables x, y, and n that closely approximate the observed data.


*If n documents are retrieved in a particular run, the optimal recall ratio obtainable is evidently n/r (for n ≤ r), i.e., the ratio reached when all documents retrieved are relevant.
APPENDIX A. Rationale for the Assumption of a Poisson Distribution for x

(1)  The  Poisson  Process  and  the  Poisson  Distribution 


This  development  follows  closely  that  given  in  Cox  and 
Miller  (pp  146  ff). 

A Poisson process is a point process on the real axis; that is, its events occur at isolated points of the axis.

We let N(t, t+Δt) be the number of these events that occur in the interval between t and t+Δt. The length Δt of the interval is assumed to be small. Let ρ be a positive constant. We further assume

$$\mathrm{Prob}\{N(t, t+\Delta t) = 0\} = 1 - \rho\,\Delta t + o(\Delta t)$$

$$\mathrm{Prob}\{N(t, t+\Delta t) = 1\} = \rho\,\Delta t + o(\Delta t)$$

so that

$$\mathrm{Prob}\{N(t, t+\Delta t) \ge 2\} = o(\Delta t);$$

also, N(0, t) and N(t, t+Δt) are independent. Here o(Δt) denotes a quantity that is small compared with Δt, in the sense that o(Δt)/Δt → 0 as Δt → 0; for most practical purposes we may take it to be zero. Roughly speaking, we have assumed that the probability of one event occurring in a small interval Δt is proportional to the length of the interval; that the probability of more than one event occurring in the interval is negligible; that on the average ρ events occur per unit of measurement; and that the occurrence of an event in one interval does not affect the occurrence of events in another, disjoint interval. This is the Poisson process.
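These assumptions can be illustrated by simulation. The sketch below is a hypothetical illustration, not part of the original analysis: it divides (0, t) into intervals of length Δt, lets each interval hold one event with probability ρΔt (and never more than one), and checks that the resulting count has mean and variance close to ρt, as a Poisson count should. The values of ρ, t, Δt, and the seed are arbitrary.

```python
import random


def simulate_count(rho, t, dt, rng):
    """Number of events in (0, t) for a discretized Poisson process.

    Each interval of length dt independently holds one event with
    probability rho * dt and never more than one.
    """
    steps = round(t / dt)
    return sum(1 for _ in range(steps) if rng.random() < rho * dt)


rng = random.Random(1967)          # fixed seed so the run is repeatable
rho, t, dt, runs = 2.0, 3.0, 0.01, 2000
counts = [simulate_count(rho, t, dt, rng) for _ in range(runs)]
mean = sum(counts) / runs
var = sum((c - mean) ** 2 for c in counts) / (runs - 1)
# For a Poisson count, both the mean and the variance equal rho * t.
print(f"mean {mean:.3f}, variance {var:.3f}, rho*t = {rho * t}")
```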

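Let N(t) = N(0, t) = number of events that occur in the interval (0, t), and let pᵢ(t) = Prob{N(t) = i}, i = 0, 1, .... Then

$$
\begin{aligned}
p_i(t+\Delta t) &= \mathrm{Prob}\{N(t+\Delta t) = i\} \\
&= \mathrm{Prob}\{N(t) = i \text{ and } N(t, t+\Delta t) = 0\} \\
&\quad + \mathrm{Prob}\{N(t) = i-1 \text{ and } N(t, t+\Delta t) = 1\} \\
&\quad + \sum_{k=2}^{i} \mathrm{Prob}\{N(t) = i-k \text{ and } N(t, t+\Delta t) = k\} \\
&= p_i(t)\,\bigl(1 - \rho\,\Delta t + o(\Delta t)\bigr) + p_{i-1}(t)\,\bigl(\rho\,\Delta t + o(\Delta t)\bigr) + \sum_{k=2}^{i} o(\Delta t)
\end{aligned}
$$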
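The joint probabilities above factor into products because N has independent increments on nonoverlapping intervals. Setting p₋₁(t) = 0 (so that the formula also holds for i = 0) and collecting the finitely many o(Δt) terms, we have

$$p_i(t+\Delta t) = p_i(t)\,(1 - \rho\,\Delta t) + p_{i-1}(t)\,\rho\,\Delta t + o(\Delta t).$$

Subtracting pᵢ(t), dividing by Δt, and letting Δt → 0 yields the differential equations

$$p_i'(t) = -\rho\,p_i(t) + \rho\,p_{i-1}(t), \qquad i = 0, 1, 2, \ldots$$

and the boundary conditions

$$p_0(0) = 1, \qquad p_i(0) = 0, \quad i = 1, 2, \ldots$$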

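Consider now the generating function

$$G(z, t) = \sum_{i=0}^{\infty} p_i(t)\, z^i.$$

Then

$$\frac{\partial G}{\partial t} = \sum_{i=0}^{\infty} p_i'(t)\, z^i = \sum_{i=0}^{\infty} \bigl[-\rho\,p_i(t) + \rho\,p_{i-1}(t)\bigr] z^i = -\rho\,G(z, t) + \rho z\,G(z, t) = \rho(z-1)\,G(z, t),$$

and, solving this equation in t for fixed z,

$$G(z, t) = A(z)\, e^{-\rho t + \rho t z}.$$

Since G(z, 0) = Σ pᵢ(0) zⁱ = 1 by the boundary conditions, A(z) = 1, and

$$\mathrm{Prob}\{N(t) = i\} = p_i(t) = \frac{1}{i!}\,\frac{\partial^i G(z, t)}{\partial z^i}\bigg|_{z=0} = \frac{e^{-\rho t}(\rho t)^i}{i!}.$$

Hence N(t), the number of events that occur in the interval (0, t), has a Poisson distribution with parameter ρt.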
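The solution can be checked by integrating the differential equations numerically. The following sketch uses simple Euler steps, chosen here for brevity rather than taken from the report, with hypothetical values ρ = 1.5 and t = 2, and compares the numerical pᵢ(t) with e^(−ρt)(ρt)ⁱ/i!:

```python
from math import exp, factorial


def integrate_forward_equations(rho, t, n_max, dt):
    """Euler integration of p_i'(t) = -rho p_i(t) + rho p_{i-1}(t).

    Starts from p_0(0) = 1, p_i(0) = 0 and keeps states i = 0..n_max.
    Truncation only drops mass above n_max; p_0..p_n_max are unaffected
    because each p_i depends only on p_i and p_{i-1}.
    """
    p = [1.0] + [0.0] * n_max
    for _ in range(round(t / dt)):
        # The new p_i uses the old p_{i-1}; p_{-1} is taken as zero.
        p = [p[i] + dt * (-rho * p[i] + rho * (p[i - 1] if i > 0 else 0.0))
             for i in range(n_max + 1)]
    return p


def poisson_pmf(i, mean):
    return exp(-mean) * mean**i / factorial(i)


rho, t = 1.5, 2.0
numeric = integrate_forward_equations(rho, t, n_max=30, dt=1e-4)
max_err = max(abs(numeric[i] - poisson_pmf(i, rho * t)) for i in range(31))
print(f"largest difference from e^(-rho t)(rho t)^i / i!: {max_err:.2e}")
```

Each Euler step is itself a Bernoulli trial with probability ρΔt, so the numerical solution is exactly a binomial distribution with t/Δt trials; it approaches the Poisson form as Δt shrinks.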

(2) The Retrieval Process

Through the arrangement of document descriptions in the ABC index, any given question will guide the searcher to certain subsets of descriptions.
In a first approximation, reference is provided from the keywords of the question to the corresponding cluster of descriptions permuted under those keywords or under synonymous terms. These subsets of descriptions may contain, in typical cases, between 20 and about 100 items, with a few descriptions pertaining to relevant documents interspersed. The searcher then scans through the subset of descriptions and selects those that seem to pertain to relevant documents. After the search is complete, the documents pertaining to the selected descriptions are evaluated with regard to their relevance to the inquiry (ref 3, page 6). The total set of descriptions selected is then divided into a subset pertaining to relevant documents and a subset pertaining to nonrelevant documents.


(3)  Retrieval  as  a  Poisson  Process 


A correspondence between the mathematical process outlined in (1) and the physical process outlined in (2) can now be established. We shall first discuss the retrieval of relevant documents. The subset of descriptions scanned corresponds to the real axis. The selection of a relevant description* is a point event on this axis, and N(t, t+Δt) is the number of descriptions selected in an interval Δt.

Our Δt consists of one description. Hence N(t, t+Δt) can be either zero or one, so the assumption that more than one event in an interval has negligible probability holds exactly. If we now say that there is some rate ρ₁ at which the relevant descriptions are selected, the correspondence between the two processes is complete. It then follows that the number of relevant documents retrieved from this subset has a Poisson distribution with parameter ρ₁d₁, where d₁ is the number of documents in the subset.

Similarly, a process with the same type of factors operates in the retrieval of nonrelevant documents. The number of those that are selected has a Poisson distribution with parameter ρ₂d₂.


*The term "relevant description" stands for "description pertaining to a relevant document."
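The conditional pmf of x given n is thus the binomial coefficient n!/[x!(n − x)!] multiplied by the ratios of the two Poisson means to their sum, raised to the powers x and n − x, respectively. Letting p denote the first of these ratios (the mean number of relevant documents retrieved divided by the mean total number retrieved) and 1 − p = q the second, we finally obtain the pmf of x, given n:

$$f(x \mid n) = \binom{n}{x}\, p^{x}\, q^{\,n-x}$$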
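The passage from two independent Poisson counts to this binomial pmf can be checked numerically. This sketch assumes, as in Appendix A, that the numbers of relevant and nonrelevant documents retrieved are independent Poisson variables, so that their sum n is Poisson as well; the means are hypothetical, and p is taken as the relevant mean divided by the sum of both means.

```python
from math import comb, exp, factorial


def poisson_pmf(k, mean):
    return exp(-mean) * mean**k / factorial(k)


def conditional_pmf(x, n, mean_rel, mean_nonrel):
    """P(x relevant | n retrieved) when the relevant and nonrelevant
    counts are independent Poisson variables with the given means."""
    joint = poisson_pmf(x, mean_rel) * poisson_pmf(n - x, mean_nonrel)
    total = poisson_pmf(n, mean_rel + mean_nonrel)  # the sum is Poisson too
    return joint / total


mean_rel, mean_nonrel = 4.0, 0.7          # hypothetical means
p = mean_rel / (mean_rel + mean_nonrel)   # ratio playing the role of p
q = 1 - p
n = 6
for x in range(n + 1):
    direct = conditional_pmf(x, n, mean_rel, mean_nonrel)
    binomial = comb(n, x) * p**x * q ** (n - x)
    print(x, f"{direct:.6f}", f"{binomial:.6f}")
```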
APPENDIX C. Derivation of the Maximum Likelihood Estimates for p and λ(r) Based on a Series of Observations of (x, n)


Suppose we observe the outcomes of k trials

The advantage of the maximum likelihood method is now that the distribution of the need not be known. All we need is the joint pmf for all observed pairs (xᵢ, nᵢ), i = 1, ..., k. Since (6) holds for each pair (xᵢ, nᵢ), we have for the joint pmf of all pairs

$$H = \prod_{i=1}^{k} h(x_i, n_i; p, r)$$


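Besides p, this distribution contains λ(r) as a parameter. Next, we define the likelihood function L = ln H:

$$L = \sum_{i=1}^{k} \ln\binom{n_i}{x_i} + \sum_{i=1}^{k} x_i \ln p + \Bigl(\sum_{i=1}^{k} n_i - \sum_{i=1}^{k} x_i\Bigr)\ln q - k\lambda(r) + \sum_{i=1}^{k} n_i \ln\lambda(r) - \sum_{i=1}^{k} \ln n_i!$$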


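Estimates for the parameters p and λ(r) can now be determined by maximizing L with regard to these parameters. We obtain p̂ by solving ∂L/∂p = 0 for p:

$$\frac{\partial L}{\partial p} = \frac{\sum_{i} x_i}{p} - \frac{\sum_{i} n_i - \sum_{i} x_i}{1 - p} = 0,
\qquad
\hat{p} = \frac{\sum_{i=1}^{k} x_i}{\sum_{i=1}^{k} n_i}.$$

λ̂(r) is obtained in a similar manner by solving ∂L/∂λ(r) = 0 for λ(r):

$$\frac{\partial L}{\partial \lambda(r)} = -k + \frac{\sum_{i} n_i}{\lambda(r)} = 0,
\qquad
\hat{\lambda}(r) = \frac{1}{k}\sum_{i=1}^{k} n_i.$$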
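A minimal sketch of the two closed-form estimates with hypothetical (xᵢ, nᵢ) pairs. It also evaluates L directly, assuming that (6) has the binomial-times-Poisson form that appears in Appendix D, and confirms that small moves away from (p̂, λ̂(r)) lower it. Function names are illustrative only.

```python
from math import comb, lgamma, log


def ml_estimates(pairs):
    """Closed-form ML estimates from observed (x_i, n_i) pairs."""
    k = len(pairs)
    sum_x = sum(x for x, _ in pairs)
    sum_n = sum(n for _, n in pairs)
    return sum_x / sum_n, sum_n / k   # (p_hat, lambda_hat)


def log_likelihood(pairs, p, lam):
    """L = ln H, with each factor binomial(n, p) in x times Poisson(lam) in n."""
    return sum(
        log(comb(n, x)) + x * log(p) + (n - x) * log(1 - p)
        - lam + n * log(lam) - lgamma(n + 1)
        for x, n in pairs
    )


# Hypothetical runs, all with the same r (not data from the test).
pairs = [(2, 3), (1, 1), (3, 4), (0, 1)]
p_hat, lam_hat = ml_estimates(pairs)
best = log_likelihood(pairs, p_hat, lam_hat)
# Nudging either parameter away from the estimate lowers L.
for dp, dl in [(0.01, 0), (-0.01, 0), (0, 0.05), (0, -0.05)]:
    assert log_likelihood(pairs, p_hat + dp, lam_hat + dl) < best
print(f"p_hat = {p_hat:.3f}, lambda_hat = {lam_hat:.3f}")
```

Because L splits into a part that depends only on p and a part that depends only on λ(r), the two estimates can be found independently of each other.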
APPENDIX D. Derivation of the Recall Parameter ρ(r)

(1) For a Single Observation of (x, r)

Let y = x/r and ρ₁(r) = E(y).

Substituting yr for x in (6), we obtain the joint pmf of y and n:

$$h(y, n; p, r) = \binom{n}{yr}\, p^{yr}\, q^{\,n-yr}\, e^{-\lambda(r)}\,\frac{[\lambda(r)]^{n}}{n!}, \qquad y = 0, \tfrac{1}{r}, \tfrac{2}{r}, \ldots;\quad n = yr,\ yr+1, \ldots$$

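Summing this pmf over the values of n, we obtain the pmf of y:

$$
\begin{aligned}
v(y; p, r) &= \sum_{n=yr}^{\infty} h(y, n; p, r)
= \frac{p^{yr}\, e^{-\lambda(r)}}{(yr)!} \sum_{n=yr}^{\infty} \frac{q^{\,n-yr}\,[\lambda(r)]^{n}}{(n-yr)!} \\
&= \frac{[\lambda(r)p]^{yr}\, e^{-\lambda(r)}\, e^{\lambda(r)q}}{(yr)!}
= \frac{[\lambda(r)p]^{yr}\, e^{-\lambda(r)p}}{(yr)!},
\qquad yr = 0, 1, \ldots
\end{aligned}
$$

or, equivalently, y = 0, 1/r, 2/r, ....

Thus yr has a Poisson distribution with mean E[yr] = λ(r)p, and, since r is a constant,

$$\rho_1(r) = E[y] = \frac{\lambda(r)\,p}{r}.$$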
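The summation over n can be checked numerically. In this sketch x stands for yr, the infinite series is truncated at a hypothetical n_max = 150, and the values of p, λ(r), and r are illustrative only:

```python
from math import comb, exp, factorial


def joint_pmf(x, n, p, lam):
    """h(x, n): binomial(n, p) for x given n, times Poisson(lam) for n."""
    return comb(n, x) * p**x * (1 - p) ** (n - x) * exp(-lam) * lam**n / factorial(n)


def marginal_pmf(x, p, lam, n_max=150):
    """Sum h(x, n) over n = x, x+1, ..., n_max (truncated series)."""
    return sum(joint_pmf(x, n, p, lam) for n in range(x, n_max + 1))


p, lam, r = 0.857, 6.0, 8      # hypothetical values
for x in range(6):
    poisson = exp(-lam * p) * (lam * p) ** x / factorial(x)
    print(f"x = {x}: summed {marginal_pmf(x, p, lam):.6f}, Poisson {poisson:.6f}")
print("rho_1(r) = lambda(r) p / r =", lam * p / r)
```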

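The moment generating function of ry is defined by

$$
\begin{aligned}
M_{ry}(t) &= E\bigl(e^{ryt}\bigr) = \sum_{ry=0}^{\infty} e^{ryt}\, v(y; p, r)
= e^{-\lambda(r)p} \sum_{ry=0}^{\infty} \frac{\bigl(e^{t}\lambda(r)p\bigr)^{ry}}{(ry)!} \\
&= e^{-\lambda(r)p}\, e^{e^{t}\lambda(r)p}
= e^{-\lambda(r)p\,[1 - e^{t}]}.
\end{aligned}
$$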

We will now derive ρ(r) for k successive inquiries, performed either by k different operators on one question or by one operator on k different questions with constant r.

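(2) Derivation of ρ(r) for k Observations of (x, r)

We define the average recall ratio for k runs with constant r by

$$\bar{y} = \frac{1}{k}\sum_{i=1}^{k} y_i \qquad\text{and}\qquad \rho_k(r) = E[\bar{y}].$$

Since the k runs are independent, the moment generating function of rkȳ is

$$M_{rk\bar{y}}(t) = E\bigl(e^{rk\bar{y}t}\bigr) = E\Bigl[e^{(\sum_{i} r y_i)t}\Bigr] = \prod_{i=1}^{k} E\bigl[e^{(r y_i)t}\bigr] = \prod_{i=1}^{k} M_{ry_i}(t) = e^{-k\lambda(r)p\,[1 - e^{t}]}.$$

Since the moment generating function determines the distribution uniquely, rkȳ is Poisson distributed with mean kλ(r)p. Hence

$$\rho_k(r) = E[\bar{y}] = \frac{k\lambda(r)p}{rk} = \rho_1(r),$$

and we define

$$\rho(r) = \rho_k(r) = \rho_1(r) = \frac{p\,\lambda(r)}{r}.$$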
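Within a subset of runs with constant r, the average recall ratio ȳ coincides exactly with λ̂(r)p̂/r when p̂ and λ̂(r) are the Appendix C estimates computed from that same subset, since both reduce to Σxᵢ/(kr). A short check with hypothetical data; the function name is illustrative:

```python
def recall_estimates(pairs, r):
    """Two estimates of rho(r) for runs that all have the same r.

    pairs: observed (x_i, n_i); x_i relevant retrieved, n_i total retrieved.
    """
    k = len(pairs)
    y_bar = sum(x / r for x, _ in pairs) / k          # average recall ratio
    p_hat = sum(x for x, _ in pairs) / sum(n for _, n in pairs)
    lam_hat = sum(n for _, n in pairs) / k
    return y_bar, lam_hat * p_hat / r                  # lambda_hat p_hat / r


pairs = [(3, 4), (2, 2), (4, 5), (1, 3)]   # hypothetical, r = 6
y_bar, plug_in = recall_estimates(pairs, r=6)
print(f"average recall ratio {y_bar:.4f}, lambda_hat*p_hat/r {plug_in:.4f}")
```

If p̂ is instead taken from all test data, as for the circles in Figure 2, the two estimates generally differ.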




APPENDIX E. A Second Look at the Problem

(1) Contingency Tables and Conditional Probabilities

We will now briefly discuss the relationship between the model developed so far and a similar approach* recently suggested, which in turn makes use of the notions and concepts introduced by John A. Swets (ref 8) and R. A. Fairthorne (ref 9).

Swets arranges the important variables of a retrieval subsystem in a 2 × 2 contingency table with attributes R for retrieved and P for pertinence (here synonymous with relevance).

                      P: Pertinent    P̄: Nonpertinent    Totals
R: Retrieved          a               b                  a + b
R̄: Nonretrieved       c               d                  c + d
Totals                a + c           b + d              a + b + c + d

Here, a, b, c, and d denote the frequencies of occurrence of the four conjunctions; e.g., a is the number of pertinent items retrieved, etc.


The following table shows the four possible conjunctions of R, R̄, P, P̄ that can be derived from Figure 4, in relation to the four basic retrieval situations and their conventional designations:

      Conjunction    Retrieval situation            Conventional designation
(a)   P·R            Pertinent, retrieved           hit
(b)   P̄·R            Nonpertinent, retrieved        false drop
(c)   P·R̄            Pertinent, nonretrieved        miss
(d)   P̄·R̄            Nonpertinent, nonretrieved     correct rejection

Based on these situations, Swets defines four conditional probabilities:

$\Pr_{P}(R)$ = cond. prob. for a pertinent item to be retrieved
$\Pr_{\bar P}(R)$ = cond. prob. for a nonpertinent item to be retrieved
$\Pr_{P}(\bar R)$ = cond. prob. for a pertinent item to be missed
$\Pr_{\bar P}(\bar R)$ = cond. prob. for a nonpertinent item to be missed

*Arthur D. Little, personal communication.

They can be estimated by the following functions of the frequencies of occurrence a, b, c, and d:

(a) a/(a + c) estimates $\Pr_{P}(R)$
(b) b/(b + d) estimates $\Pr_{\bar P}(R)$
(c) c/(a + c) estimates $\Pr_{P}(\bar R)$
(d) d/(b + d) estimates $\Pr_{\bar P}(\bar R)$
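As a sketch, the four estimates can be computed from a single table. The function also computes a/(a + b), which estimates the probability that a retrieved item is pertinent, i.e., the reversed conditional probability discussed next. The table values and function names are hypothetical.

```python
def swets_estimates(a, b, c, d):
    """Estimates of Swets' four conditional probabilities from a 2 x 2 table.

    a: pertinent retrieved (hits)        b: nonpertinent retrieved (false drops)
    c: pertinent not retrieved (misses)  d: nonpertinent not retrieved
    """
    return {
        "Pr_P(R)": a / (a + c),            # hit rate
        "Pr_notP(R)": b / (b + d),         # false-drop rate
        "Pr_P(notR)": c / (a + c),         # miss rate
        "Pr_notP(notR)": d / (b + d),      # correct-rejection rate
    }


def precision(a, b):
    """Pr_R(P): share of retrieved items that are pertinent."""
    return a / (a + b)


# Hypothetical table for one inquiry (not data from the test).
a, b, c, d = 8, 2, 4, 86
for name, value in swets_estimates(a, b, c, d).items():
    print(f"{name:14s} {value:.3f}")
print(f"{'Pr_R(P)':14s} {precision(a, b):.3f}")
```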


The conditional probabilities defined by Swets are not exhaustive; additional probabilities can be defined from the same set of four retrieval situations simply by reversing the order of the attributes "P" and "R". This gives another set of four conditional probabilities, the first of which, $\Pr_{R}(P)$ = "cond. prob. for a retrieved document to be pertinent," is identical with our precision parameter p. In the following section, we will introduce a second probability model, which is based on the four conditional probabilities as defined by Swets; since the basic parameter of this model is recall "α"* as compared with precision "p" in the first model, we will distinguish the two by denoting them "α-model" and "p-model," respectively.

(2) The α-Model

Let us now define

$\Pr_{P}(R) = \alpha$ (probability of a hit, recall)
$\Pr_{\bar P}(R) = 1 - \beta$ (probability of a false drop)
$\Pr_{P}(\bar R) = 1 - \alpha$ (probability of a miss)
$\Pr_{\bar P}(\bar R) = \beta$ (probability of a correct rejection),

and let xᵢ denote the number of relevant documents retrieved (in response to the ith inquiry) out of a collection of N documents, where it is known that rᵢ documents are relevant to the inquiry.

Further, let yᵢ denote the number of nonrelevant documents retrieved; hence nᵢ = xᵢ + yᵢ denotes the total number of documents retrieved.

*Following A. D. Little notation.

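With these definitions, and under the assumption that sampling takes place with replacement, the distribution of xᵢ is binomial with mean αrᵢ, the corresponding pmf being

$$f(x_i; \alpha) = \binom{r_i}{x_i}\,\alpha^{x_i}(1-\alpha)^{r_i - x_i}, \qquad x_i = 0, 1, 2, \ldots, r_i.$$

Likewise, yᵢ is binomial with mean (N − rᵢ)(1 − β):

$$g\bigl(y_i; (1-\beta)\bigr) = \binom{N - r_i}{y_i}\,(1-\beta)^{y_i}\,\beta^{\,N - r_i - y_i}, \qquad y_i = 0, 1, 2, \ldots, N - r_i.$$

For the combined density of xᵢ + yᵢ = nᵢ, we have

$$h(n_i) = \sum_{j=0}^{n_i} f(x_i = j)\, g(y_i = n_i - j)$$

with mean

$$E[n_i] = \alpha r_i + (N - r_i)(1 - \beta).$$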
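The convolution for h(nᵢ) and its mean can be sketched as follows, assuming, as the convolution itself does, that xᵢ and yᵢ are independent. N, r, α, and β are hypothetical values, and the function names are illustrative.

```python
from math import comb


def binom_pmf(k, n, prob):
    """Binomial pmf; zero outside 0..n."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * prob**k * (1 - prob) ** (n - k)


def total_retrieved_pmf(N, r, alpha, beta):
    """pmf of n = x + y by convolution: x ~ Bin(r, alpha), y ~ Bin(N - r, 1 - beta)."""
    return [
        sum(binom_pmf(j, r, alpha) * binom_pmf(n - j, N - r, 1 - beta)
            for j in range(n + 1))
        for n in range(N + 1)
    ]


N, r, alpha, beta = 50, 5, 0.6, 0.95    # hypothetical values
h = total_retrieved_pmf(N, r, alpha, beta)
mean = sum(n * prob for n, prob in enumerate(h))
print(f"sum of pmf = {sum(h):.6f}")
print(f"mean = {mean:.4f}, formula = {alpha * r + (N - r) * (1 - beta):.4f}")
```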

These equations, together with the definitions and assumptions listed on p. 32, form the basis for the α-model. It will be shown, however, that under certain conditions this new model leads to the same results as the p-model for:

(a) the maximum likelihood estimate for α, the hit probability of the α-model, which corresponds to ρ(r), the recall parameter in the p-model;

(b) the pmf for the observed recall ratio x/r.

The conditions are essentially those which relate the binomial pmf to the Poisson pmf, i.e., for large N, large r, and small α such that rα remains finite, the Poisson pmf closely approximates the binomial.

(3) Relation between α-Model and p-Model

(a) The Maximum Likelihood Estimate of the Hit Probability α

The joint pmf of the responses xᵢ, i = 1, 2, ..., k, to k inquiries, for each of which there are exactly r relevant documents in the collection, is given by

$$f(x_1, x_2, \ldots, x_k; \alpha) = \prod_{i=1}^{k} f(x_i; \alpha) = \prod_{i=1}^{k} \binom{r}{x_i}\,\alpha^{x_i}(1-\alpha)^{r - x_i}.$$
The corresponding likelihood function is

$$L = \ln f(x_1, \ldots, x_k; \alpha) = \sum_{i=1}^{k} \ln\binom{r}{x_i} + \sum_{i=1}^{k} x_i \ln\alpha + \Bigl[kr - \sum_{i=1}^{k} x_i\Bigr]\ln(1-\alpha).$$

The maximum likelihood estimate of α, say α̂, is that value of α that maximizes L, i.e., the solution to ∂L/∂α = 0. Hence

$$\frac{\partial L}{\partial \alpha} = \frac{\sum_{i} x_i}{\alpha} - \frac{kr - \sum_{i} x_i}{1 - \alpha} = 0$$

and

$$\hat{\alpha} = \frac{\sum_{i=1}^{k} x_i}{kr}.$$

Evidently, α̂ = ρ̂(r), the estimate for the recall parameter according to the p-model (the average recall ratio ȳ of Appendix D).
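A small check of this result, with hypothetical observations: the closed form Σxᵢ/(kr) agrees with a brute-force maximization of L over a grid of α values, and with the average recall ratio. The grid search is only a crude illustration, not a method from the report.

```python
from math import comb, log


def alpha_hat(xs, r):
    """ML estimate of the hit probability: sum(x_i) / (k r)."""
    return sum(xs) / (len(xs) * r)


def log_likelihood(xs, r, alpha):
    """L(alpha) for k inquiries, each with r relevant documents."""
    k = len(xs)
    return (sum(log(comb(r, x)) for x in xs)
            + sum(xs) * log(alpha)
            + (k * r - sum(xs)) * log(1 - alpha))


xs, r = [4, 6, 5, 3], 7               # hypothetical observations
estimate = alpha_hat(xs, r)
# Crude check: the best value on a fine grid sits next to the closed form.
grid = [i / 1000 for i in range(1, 1000)]
best_on_grid = max(grid, key=lambda a: log_likelihood(xs, r, a))
average_recall_ratio = sum(x / r for x in xs) / len(xs)
print(estimate, best_on_grid, average_recall_ratio)
```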


(b) The Probability Mass Function of the Recall Ratio x/r*

In the α-model, the pmf of x for a single inquiry is the binomial

$$f(x) = \binom{r}{x}\,\alpha^{x}(1-\alpha)^{r-x}, \qquad x = 0, 1, 2, \ldots, r.$$

Introducing z = x/r and replacing x by rz, we get

$$g(z) = \binom{r}{rz}\,\alpha^{rz}(1-\alpha)^{r - rz}, \qquad rz = 0, 1, 2, \ldots, r.$$

Then, for large r and small α such that rα remains finite, it can be shown that the binomial expression for g(z) is closely approximated by

$$g(z) = \frac{e^{-\xi}\,\xi^{rz}}{(rz)!}, \qquad rz = 0, 1, 2, \ldots, r,$$

where the parameter ξ = rα.

In the p-model, we had arrived at the same result, except that the parameter was ξ = λ(r)p (λ(r) and p as previously defined for the p-model). Hence we have

$$r\alpha = \xi = \lambda(r)\,p, \qquad \alpha = \frac{\lambda(r)\,p}{r} = \rho(r).$$

*In this paragraph, function symbols that had been introduced in previous sections are used again with a different meaning, as defined here.
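How quickly the binomial g(z) approaches the Poisson form can be seen numerically. The sketch holds ξ = rα fixed at a hypothetical value of 3 and reports the largest pointwise difference between the two pmfs as r grows; both pmfs are built from term-to-term ratios so that large factorials never overflow.

```python
from math import exp


def max_pmf_gap(r, xi):
    """Largest |binomial - Poisson| over rz = 0..r, with alpha = xi / r."""
    alpha = xi / r
    binomial = (1 - alpha) ** r      # P(rz = 0) under the binomial
    poisson = exp(-xi)               # P(rz = 0) under the Poisson
    gap = abs(binomial - poisson)
    for m in range(1, r + 1):        # m stands for rz
        binomial *= (r - m + 1) / m * alpha / (1 - alpha)
        poisson *= xi / m
        gap = max(gap, abs(binomial - poisson))
    return gap


xi = 3.0   # hypothetical value of r * alpha, held fixed
for r in (10, 50, 200, 1000):
    print(f"r = {r:5d}, alpha = {xi / r:.4f}, max gap = {max_pmf_gap(r, xi):.2e}")
```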




REFERENCES 


(1) B. Altmann; The Medium-Sized Information Service; Its Automation for Retrieval, HDL, TR-1192, 30 Dec 63 (AD 429 242).

(2) B. Altmann; A Multiple Testing of the ABC Method and the Development of a Second-Generation Model, Part I, Preliminary Discussions of Methodology, Supplement: Computer Programs of the HDL Information Systems, HDL, TB-1295, April 1965 (AD 617 118).

(3) B. Altmann; A Multiple Testing of the ABC Method and the Development of a Second-Generation Model, Part II, HDL, TR-1256, Oct 65 (AD 625 924).

(4) R. A. Fisher; Contributions to Mathematical Statistics, J. Wiley & Sons, N. Y., 1950.

(5) A. M. Mood; Introduction to the Theory of Statistics, McGraw-Hill Book Co. Inc., New York, 1950.

(6) G. P. Wadsworth and J. G. Bryan; Introduction to Probability and Random Variables, McGraw-Hill Book Co., N. Y., 1960.

(7) Biometrika Tables for Statisticians; Vol. I, ed. by E. S. Pearson and H. O. Hartley, Cambridge Univ. Press, 1956; pp. 75, 130.

(8) J. A. Swets; Information Retrieval Systems, Science 141, 1963, pp. 245-250.

(9) R. A. Fairthorne; Basic Parameters of Retrieval Tests, ADI Proceed., 1964, Vol. I, p. 343.

Selected Papers and Reports on Modeling Techniques, Especially as Applied to the Development, Description and Performance Testing of IR Systems (with Short Annotations)

(10) J. Verhoeff et al.; Mathematical Models in Systems Design for Information Retrieval. Western Reserve University, School of Library Science, May 1961, Contract No. AF 49(638)-357. Contains a description of a probabilistic model for ISR-systems.

(11) Stephen Pollock; The Normalized "Sliding" Ratio Measure. Arthur D. Little, Inc., Tech. Note CaCl, 19 Jul 65. Presents a mathematical model of an IR-system, for which generalized recall and relevance are derived as performance measures.

(12) W. Goffman, V. A. Newill; Methodology for Test and Evaluation of Information Retrieval Systems. Western Reserve Univ., School of Library Science, Report No. CSR-TRI, July 1964 (AD 614 005). Sophisticated performance evaluation measures are derived for an idealized IR-system, which is described using mathematical and set-theoretical concepts and notation.




(13) A. Trachtenberg et al.; An Investigation of the Techniques and Concepts of Information Retrieval. ITT Internat. Electric Corp., Tech. Rep. No. P-AA-TR-(0031), Contr. No. DA-36-039-SC-90787. Within the objective of developing a theory of information retrieval, a preliminary mathematical-probabilistic model of an ISR-system is presented; quantitative measures of relatedness are established.

(14) W. Karush; On Mathematical Modeling and Research in Systems. System Development Corp., SP-1039, Nov 1962. General treatise on systems modeling, especially on mathematical models.

(15) C. P. Bourne et al.; Requirements, Criteria and Measures of Performance of ISR-Systems. Stanford Research Inst., Dec 1961, SRI Proj. No. 3741 (AD 270 942). Contains a description of a general functional model (flow-chart type) of an IR-system; discusses future research in modeling for performance evaluation.

(16) R. P. Heckman; A Method for Investigating the Behavior of Attributes which Belong to ISR Systems. Georgia Inst. of Tech., Aug. 1965, Master's Thesis (AD 624 658). A mathematical-statistical model for ISR-systems is developed based on functional relationships between ISR-system attributes. Statistical analysis is performed on data from a representative sample of ISR-systems.

(17) J. M. Hoffmann; Experimental Design for Measuring the Intra- and Inter-Group Consistency of Human Judgement of Relevance. Georgia Inst. of Tech., Aug. 1965, Master's Thesis (AD 620 342). Demonstrates the applicability of statistical methods to the evaluation of tests of relevance assessment consistency.

(18) C. R. Blunt; An Information Retrieval System Model. HRB-Singer Inc., October 1965, Rep. No. 352.14-R-1 (AD 623 590). A computer simulation model for the performance evaluation of intelligence-type IR-systems.

(19) D. P. Votaw, Jr.; Statistical Science and Information Technology; in: Proceed. of the Second Congress on the Information System Sciences. Spartan Books Inc., Washington, D. C., 1965. Survey of the application of various statistical methods and tools to problems in information technology, i.e., operations research, management science, systems analysis, etc.

(20) D. R. Swanson; On Indexing Depth and Retrieval Effectiveness; in: Proceed. of the Second Congress on the Information System Sciences. Spartan Books Inc., Washington, D. C., 1965. Mathematical-statistical model for the evaluation of IR-system performance. Derivation of an analytical expression for the relation of recall and relevance to indexing depth. Application to Cranfield results.



(21) J. F. Rial; Results of Document Retrieval Experiment on Matrix Searching. Mitre Corp., April 1964, TM-03989. Presents an abstract algebraic model for document retrieval.

(22) C. R. Conger; The Simulation and Evaluation of IR-Systems. HRB-Singer Inc., April 1965, Rep. No. 352-R-17 (AD 464 619). A simulation model to study response-time aspects of computer-based IR-systems; practical applicability limited.

(23) H. Borko; The Conceptual Foundations of Information Systems. SDC, May 1965, Rep. No. SP-2057 (AD 615 718). A conceptual model of advanced ISR-systems; discusses prospects of automated indexing and abstracting.

(24) J. Marschak, K. Miyasawa; Economic Comparability of Information Systems. UCLA, Western Management Science Inst., July 1965, Working Paper No. 85 (AD 619 767). Models based on decision and utility theory for the comparative evaluation of information systems.

(25) D. J. Hillman; Study of Theories and Models of Information Storage and Retrieval (Series Title). Lehigh Univ., 1962-1965. Research supported by the National Science Foundation. Investigation of various related problems, including a Boolean algebra model for an IR-system, a graph-theoretical treatment of relatedness of documents, and numerous other topics relevant to modeling.

(26) R. Jernigan, A. G. Dale; Set-Theoretic Models for Classification and Retrieval. Univ. of Texas Linguistics Res. Center, Nov. 1964, LRC64-W1W-5. Models based on notions and axioms from lattice theory and topology are suggested for the analysis of ISR-systems.

(27) G. A. Markel; Toward a General Methodology for Systems Evaluation. HRB-Singer, Inc., July 1965, Rep. No. 352-R-13 (AD 619 373). Contains an annotated bibliography of literature on systems modeling and simulation. See also: Rep. No. 352.14-R-2, April 66.

(28) A. J. Sailer; Linear Prediction Models for a Mechanized Information System. University of Pittsburgh, Master's Thesis, 1966 (AD 481 444), Cont. AF 33(608)-1768.




UNCLASSIFIED
Security Classification

DOCUMENT CONTROL DATA - R & D
(Security classification of title, body of abstract and indexing annotation must be entered when the overall report is classified)

Originating activity (corporate author): Harry Diamond Laboratories, Washington, D. C. 20438
Report security classification: UNCLASSIFIED
Report title: MULTIPLE TEST OF ABC METHOD PART III - MATHEMATICAL MODEL
Report date: May 1967
Total no. of pages: 44
No. of refs: 28
Project no.: DA-1L013001A91A
Code: 5016.11.84400
HDL Proj. No. 01200
Originator's report number: TR-1334
Distribution statement: Distribution of this document is unlimited.

Abstract: The report suggests a method of constructing a mathematical model for the first test of the ABC Storage and Retrieval Systems and calculates 95-percent confidence intervals for relevance and recall values.

DD Form 1473


